7 research outputs found

    Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which performance depends heavily on the feature selection technique, such as the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders; it progresses slowly while dramatically affecting quality of life. In this paper, we use multi-modal neuroimaging data to diagnose PD by investigating the brain regions known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques, which select features based on their performance in the original input feature space, we select the features that most benefit the classification scheme in the kernel space. We further propose kernel functions specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.

    Human action recognition by embedding silhouettes and visual words

    With the availability of cheap video recording devices, fast internet access and huge storage capacity, the corpus of accessible video has grown tremendously over the last few years. Processing these videos for end-user tasks such as video retrieval, human-computer interaction (HCI) and biometrics requires automatic understanding of video content. Human action recognition is one aspect of video understanding that is useful in surveillance, behavioral analysis and HCI. Although this problem has been studied for some years, challenges remain, including cluttered backgrounds, intra-class variance, inter-class similarity and occlusion. In this thesis, we propose three methods for action recognition. First, we propose a novel embedding for learning the manifold of human actions that is optimal with respect to the spatio-temporal correlation distance (SCD) between sequences. Sequences of actions can be compared based on distances between frames; however, comparison based on between-sequence distance is more efficient and effective. In particular, our proposed embedding minimizes the sum of distances between intra-class sequences while maximizing the sum of distances between inter-class points. Action sequences are represented by key postures chosen equidistantly from a semantic period of the action. The projected sequences are compared using SCD or the Hausdorff distance in a nearest-neighbor framework. The method not only outperforms other dimension reduction methods but is also comparable to the state of the art on three public datasets. Moreover, it is largely robust to additive noise, occlusion, shape deformation and change in viewpoint. Second, we propose an approach for introducing semantic relations into the bag-of-words framework for recognizing human actions. In the standard bag-of-words framework, features are clustered based on their appearance and not their semantic relations. We exploit latent semantic models such as LSA and pLSA, as well as Canonical Correlation Analysis, to find a subspace in which visual words are more semantically distributed. We project the visual words into the computed space and apply k-means to obtain semantically meaningful clusters, which serve as a semantic visual vocabulary that leads to more discriminative histograms for recognizing actions. Our proposed method gives promising results on the challenging KTH action dataset. Finally, we introduce a novel method for combining information from multiple viewpoints. Spatio-temporal features are extracted from each viewpoint and used in a bag-of-words framework. Two codebooks of different sizes are used to form the histograms. The similarity between the computed histograms is captured by the histogram intersection kernel (HIK) as well as the RBF kernel with chi-square distance. The resulting kernels are linearly combined using weights learned through an optimization process. For greater efficiency, a separate set of optimal weights is calculated for each binary SVM classifier. Our proposed method not only enables us to combine multiple views efficiently but also models the action in multiple spaces using the same features, thereby increasing performance. Several experiments are performed to show the efficiency of the framework as well as of its constituent parts. We obtain a state-of-the-art accuracy of 95.8% on the challenging IXMAS multi-view dataset. DOCTOR OF PHILOSOPHY (SCE
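The two histogram kernels named in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `gamma` parameter and the particular chi-square convention (no factor of 1/2, empty bins skipped) are assumptions made for illustration.

```python
import math

def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi_square_distance(h1, h2):
    """Chi-square distance, one common convention (assumed here):
    sum over bins of (a - b)^2 / (a + b), skipping bins where both are zero."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def chi_square_rbf(h1, h2, gamma=1.0):
    """RBF kernel built on the chi-square distance; gamma is a hypothetical
    bandwidth parameter that would normally be tuned on validation data."""
    return math.exp(-gamma * chi_square_distance(h1, h2))

# Two toy bag-of-words histograms over a three-word codebook.
h_a = [0.5, 0.5, 0.0]
h_b = [0.25, 0.25, 0.5]
similarity_hik = histogram_intersection(h_a, h_b)
similarity_rbf = chi_square_rbf(h_a, h_b)
```

Both functions assume non-negative histograms of equal length, which is what a bag-of-words representation produces.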

    Human action recognition using pose-based discriminant embedding

    Manifold learning is an efficient approach for recognizing human actions. Most previous embedding methods are learned from the distances between frames treated as data points. They may therefore be efficient for frame recognition, but they are not guaranteed to give optimal results when sequences are to be classified, as in action recognition, where temporal constraints convey important information. In the sequence recognition framework, sequences are compared based on distances defined between sets of points. Among these, the Spatio-temporal Correlation Distance (SCD) is an efficient measure for comparing ordered sequences. In this paper, we propose a novel embedding that is optimal in the sequence recognition framework with SCD as the distance measure. Specifically, the proposed embedding minimizes the sum of the distances between intra-class sequences while seeking to maximize the sum of distances between inter-class points. Action sequences are represented by key poses chosen equidistantly from one action period. The action period is computed by a modified correlation-based method. Action recognition is achieved by comparing the projected sequences in the low-dimensional subspace using SCD or the Hausdorff distance in a nearest-neighbor framework. Several experiments are carried out on three popular datasets. The method is shown not only to classify actions efficiently, obtaining results comparable to the state of the art on all datasets, but also to be robust to additive noise and tolerant of occlusion, deformation and change in viewpoint. Moreover, the method outperforms other classical dimension reduction techniques and runs faster by using fewer postures.
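As a rough illustration of the final classification step, the sketch below compares projected sequences with the symmetric Hausdorff distance and assigns the label of the nearest training sequence. It is a sketch under stated assumptions: the function names and toy data are hypothetical, the embedding itself is not shown, and SCD is omitted because the abstract does not define it.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def directed_hausdorff(seq_a, seq_b):
    """Largest distance from a point in seq_a to its closest point in seq_b."""
    return max(min(euclidean(a, b) for b in seq_b) for a in seq_a)

def hausdorff(seq_a, seq_b):
    """Symmetric Hausdorff distance between two point sets.
    Note that it ignores the temporal order of the key poses."""
    return max(directed_hausdorff(seq_a, seq_b), directed_hausdorff(seq_b, seq_a))

def nearest_neighbor_label(query, training_set):
    """Return the label of the training sequence closest to `query`.
    `training_set` is a list of (sequence, label) pairs (hypothetical layout)."""
    _, label = min(training_set, key=lambda item: hausdorff(query, item[0]))
    return label

# Toy projected sequences: each is a list of low-dimensional key poses.
training = [
    ([(0.0, 0.0), (1.0, 0.0)], "walk"),
    ([(5.0, 5.0), (6.0, 5.0)], "wave"),
]
predicted = nearest_neighbor_label([(0.0, 0.1), (1.0, 0.1)], training)
```

Because the Hausdorff distance treats each sequence as an unordered set, an order-aware measure such as SCD would be the natural substitute where temporal structure matters.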

    Efficient 2D viewpoint combination for human action recognition

    The ability to recognize human actions from a single viewpoint is affected by phenomena such as self-occlusion or occlusion by other objects. Incorporating multiple cameras can help overcome these issues. However, the question remains how to efficiently use information from all viewpoints to increase performance. Researchers have reconstructed a 3D model from multiple views to reduce dependency on viewpoint, but this 3D approach is often computationally expensive. Moreover, the quality of each view influences the overall model, and the reconstruction is limited to volumes where the views overlap. In this paper, we propose a novel method to efficiently combine 2D data from different viewpoints. Spatio-temporal features are extracted from each viewpoint and then used in a bag-of-words framework to form histograms. Two codebook sizes are used. The similarity between the obtained histograms is represented via the Histogram Intersection kernel as well as the RBF kernel with χ² distance. Lastly, we combine all the basic kernels generated by selecting different viewpoints, feature types, codebook sizes and kernel types. The final kernel is a linear combination of basic kernels, weighted according to an optimization process. For higher accuracy, the sets of kernel weights are computed separately for each binary SVM classifier. Our method not only combines the information from multiple viewpoints efficiently, but also improves performance by mapping features into various kernel spaces. The efficiency of the proposed method is demonstrated by testing on two commonly used multi-view human action datasets. Moreover, several experiments indicate the contribution of each part of the method to the overall performance.
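The linear combination of basic kernels described in this abstract can be sketched as a weighted sum of Gram matrices. This is only an illustration: the function name is hypothetical, the weights are fixed placeholders, and the optimization that the paper uses to learn them is not reproduced here.

```python
def combine_kernels(gram_matrices, weights):
    """Weighted sum of equally sized Gram matrices (lists of lists).
    Non-negative weights keep the result a valid kernel when every
    input matrix is itself a valid kernel matrix."""
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(gram_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, w in zip(gram_matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * gram[i][j]
    return combined

# Toy Gram matrices for two samples, e.g. one per viewpoint or kernel type.
k_view1 = [[1.0, 0.5], [0.5, 1.0]]
k_view2 = [[1.0, 0.0], [0.0, 1.0]]
# Placeholder weights; in the paper these come from an optimization
# run separately for each binary SVM classifier.
k_final = combine_kernels([k_view1, k_view2], [0.5, 0.5])
```

A combined matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel, with one weight vector per binary classifier.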